chore: set ensure_ascii=False for json serialization to preserve unicode chars
#1386
In the Langfuse web UI, `output` is encoded as Unicode escape sequences (`\uXXXX`). This is the default behavior of Python's `json.dumps()`. It's annoying for non-English people like me, so set `ensure_ascii=False` for `json.dumps()`.
Important
Set `ensure_ascii=False` in JSON serialization to preserve Unicode characters, and add tests to verify this behavior.
- Set `ensure_ascii=False` in `json.dumps()` in `_serialize()` in `attributes.py`, `_next()` and `_get_item_size()` in `score_ingestion_consumer.py`, and `post()` in `request.py` to preserve Unicode characters.
- Add `test_unicode_serialization.py` to verify Unicode characters are preserved in serialized output.
This description was created for 500db8d.
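The behavior described above can be reproduced with the standard library alone. The snippet below is a minimal sketch, not code from this PR; the `payload` dictionary is a made-up example.

```python
import json

# Hypothetical payload; any non-ASCII text behaves the same way.
payload = {"output": "こんにちは"}

# Default behavior (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are written as-is.
preserved = json.dumps(payload, ensure_ascii=False)

print(escaped)    # {"output": "\u3053\u3093\u306b\u3061\u306f"}
print(preserved)  # {"output": "こんにちは"}
```

Both strings are valid JSON and decode to the same object; only the textual representation differs.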
Disclaimer: Experimental PR review
Greptile Overview
Updated On: 2025-10-01 04:39:28 UTC
Summary
This pull request improves Unicode character handling in the Langfuse Python SDK by adding `ensure_ascii=False` to all `json.dumps()` calls throughout the codebase. The change affects three critical serialization points: the OpenTelemetry attributes serializer (`_serialize` function in `attributes.py`), the HTTP client for API requests (`request.py`), and the score ingestion consumer for batch processing (`score_ingestion_consumer.py`).
By default, Python's `json.dumps()` escapes non-ASCII characters as Unicode sequences (e.g., `\u3053\u3093\u306b\u3061\u306f` instead of `こんにちは`), making output unreadable for non-English users in the Langfuse web UI. This change preserves Unicode characters in their native form, significantly improving the user experience for international users working with Japanese, Chinese, Korean, Arabic, Russian, and other non-Latin scripts.
The PR includes a comprehensive test file (`test_unicode_serialization.py`) that validates Unicode preservation across multiple writing systems and emoji. The change is backward-compatible as the resulting JSON remains valid, and the modification is applied consistently across all serialization points to ensure uniform behavior throughout the SDK.
Important Files Changed
Changed Files
- `langfuse/_client/attributes.py`: adds `ensure_ascii=False` to `json.dumps()` in the `_serialize` function used for OpenTelemetry span attributes
- `langfuse/_utils/request.py`: adds `ensure_ascii=False` to `json.dumps()` in the HTTP client used for all API requests to Langfuse
- `langfuse/_task_manager/score_ingestion_consumer.py`: adds `ensure_ascii=False` to two `json.dumps()` calls in the score ingestion batch processing pipeline
- `tests/test_unicode_serialization.py`
Confidence score: 5/5
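The backward-compatibility claim, and the effect on serialized size, can be checked with a short sketch. This is not the PR's test file; the `event` dictionary is hypothetical, and how `_get_item_size()` actually measures size is not shown here, so the byte counts below only illustrate that the two forms differ.

```python
import json

# Hypothetical score-like event mixing Japanese text and an emoji.
event = {"name": "greeting", "value": "こんにちは 🎉"}

escaped = json.dumps(event)                        # default, ASCII-only output
preserved = json.dumps(event, ensure_ascii=False)  # characters kept as-is

# Both forms are valid JSON and decode to the same object.
assert json.loads(escaped) == json.loads(preserved) == event

# Only the escaped form is pure ASCII.
assert escaped.isascii()
assert not preserved.isascii()

# Size on the wire: each \uXXXX escape costs 6 bytes, while UTF-8 needs
# 3 bytes for these kana and 4 bytes for the emoji.
escaped_bytes = len(escaped.encode("utf-8"))
preserved_bytes = len(preserved.encode("utf-8"))
assert preserved_bytes < escaped_bytes

# Character count and UTF-8 byte count no longer coincide once
# non-ASCII characters are preserved.
assert len(preserved) < preserved_bytes
```

Any size limit expressed in bytes should therefore be computed on the encoded bytes rather than on the string length.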
Sequence Diagram
sequenceDiagram
    participant User
    participant LangfuseClient as "Langfuse Client"
    participant EventSerializer as "Event Serializer"
    participant JSONEncoder as "JSON Encoder"
    participant APIEndpoint as "API Endpoint"
    User->>LangfuseClient: "serialize data with unicode content"
    LangfuseClient->>EventSerializer: "_serialize(data)"
    EventSerializer->>JSONEncoder: "json.dumps(obj, cls=EventSerializer, ensure_ascii=False)"
    JSONEncoder-->>EventSerializer: "serialized json string with preserved unicode"
    EventSerializer-->>LangfuseClient: "unicode-preserved json string"
    User->>LangfuseClient: "batch_post(**kwargs)"
    LangfuseClient->>EventSerializer: "json.dumps(kwargs, cls=EventSerializer, ensure_ascii=False)"
    EventSerializer-->>LangfuseClient: "serialized data with unicode preserved"
    LangfuseClient->>APIEndpoint: "POST request with unicode content"
    APIEndpoint-->>LangfuseClient: "response"
    LangfuseClient-->>User: "response"
    User->>LangfuseClient: "upload score events"
    LangfuseClient->>EventSerializer: "serialize events with unicode"
    EventSerializer->>JSONEncoder: "json.dumps(event, cls=EventSerializer, ensure_ascii=False)"
    JSONEncoder-->>EventSerializer: "unicode-preserved serialization"
    EventSerializer-->>LangfuseClient: "serialized events"
    LangfuseClient->>APIEndpoint: "upload batch with unicode content"
    APIEndpoint-->>LangfuseClient: "upload response"
    LangfuseClient-->>User: "upload complete"
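The `json.dumps(obj, cls=..., ensure_ascii=False)` call pattern in the diagram can be sketched as follows. `SketchSerializer` and `serialize` are hypothetical stand-ins written for this illustration; they are not the SDK's `EventSerializer` or `_serialize`, whose implementations are not shown on this page.

```python
import json
from datetime import datetime, timezone


class SketchSerializer(json.JSONEncoder):
    """Hypothetical stand-in for a custom encoder such as EventSerializer."""

    def default(self, o):
        # Handle one non-JSON type for illustration; defer everything else.
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)


def serialize(obj):
    # Same call shape as in the diagram: custom encoder plus ensure_ascii=False.
    return json.dumps(obj, cls=SketchSerializer, ensure_ascii=False)


event = {
    "comment": "Привет, мир",
    "timestamp": datetime(2025, 10, 1, tzinfo=timezone.utc),
}

body = serialize(event)
print(body)  # {"comment": "Привет, мир", "timestamp": "2025-10-01T00:00:00+00:00"}
```

The `ensure_ascii` flag is independent of the `cls` argument, so a custom encoder needs no changes of its own for the Unicode text to be preserved.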